This tutorial was originally published on DataCareer.
In this Jupyter Notebook we will retrieve data from the European Central Bank (ECB). Before diving into the code, please take a quick look at the following websites, to get a feel for the data we will be dealing with:
EU portal: https://data.europa.eu/euodp/en/data/publisher/ecb
ECB SDMX 2.1 RESTful web service: https://sdw-wsrest.ecb.europa.eu/help/
SDMX documentation: https://sdmx.org
ECB statistics: https://www.ecb.europa.eu/stats/ecb_statistics/html/index.en.html
Statistical Data Warehouse: https://sdw.ecb.europa.eu
In this tutorial we will specifically take a look at foreign exchange (FX) rates, using Python 3. As always, let's start with importing some packages we will use for this exercise.
import requests # 2.18.4
import pandas as pd # 0.23.0
import io
In this notebook, we will retrieve the Euro / Swiss Francs (EURCHF) exchange rate time series from the year 2000 until today. To retrieve the data, we need to construct an URL which we can use in a HTTP request. Fortunately, this is pretty simple.
The query string is basically:
protocol://wsEntryPoint/resource/flowRef/key?parameters
And the parameters are defined as follows:
startPeriod=value&endPeriod=value&updatedAfter=value&firstNObservations=value&lastNObservations=value&detail=value&includeHistory=value
Let's break this down:
# Building blocks for the URL
entrypoint = 'https://sdw-wsrest.ecb.europa.eu/service/' # Using protocol 'https'
resource = 'data' # The resource for data queries is always'data'
flowRef ='EXR' # Dataflow describing the data that needs to be returned, exchange rates in this case
key = 'D.CHF.EUR.SP00.A' # Defining the dimension values, explained below
# Define the parameters
parameters = {
'startPeriod': '2000-01-01', # Start date of the time series
'endPeriod': '2018-10-01' # End of the time series
}
Key (dimensions) explained:
For this example we only use two parameters, startPerdiod
and endPeriod
, but you can add more if you like.
Now we have to put all this together to construct the URL:
# Construct the URL: https://sdw-wsrest.ecb.europa.eu/service/data/EXR/D.CHF.EUR.SP00.A
request_url = entrypoint + resource + '/'+ flowRef + '/' + key
# Make the HTTP request
response = requests.get(request_url, params=parameters)
# Check if the response returns succesfully with response code 200
print(response)
# Print the full URL
print(response.url)
The request/response has been succesful. If you click on this link, it will download the data as a file. It doesn't have a useful extension, but you can open it as a text file if you would like to inspect it's contents. But of course we dan do this in Python too without leaving this notebook. Let's take a sneak peek at the data we received.
# Print the first 1000 characters of the response
print(response.text[0:1000])
Hmmm, the response is in XML. Not impossible, but also not the easiest format to work with in Pandas. Fortunately the ECB's API let's us get the data in CSV format by specifying it in the header of the request.
# Make the HTTP request again, now requesting for CSV format
response = requests.get(request_url, params=parameters, headers={'Accept': 'text/csv'})
# Response succesful? (Response code 200)
print(response)
# Print the first 1000 characters to inspect the response
response.text[0:1000]
Excellent! Now we only need to load this response in a Pandas DataFrame. We can use 'StringIO' to read the strings as a file. This way we don't need to save it first and we can use it directly.
# Read the response as a file into a Pandas DataFrame
df = pd.read_csv(io.StringIO(response.text))
# Check the DataFrame's information
df.info()
You can see there are 4858 rows, which makes sense for almost 19 years of daily prices. There are 32 columns, which is probably more than we need. We can inspect the DataFrame to see which columns we need.
# Show the last 5 entries of the DataFrame
df.tail()
The columns we need are 'TIME_PERIOD' for the dates and 'OBS_VALUE' for the prices. Let's also do a sanity check on the prices in 'OBS_VALUE'.
# Inspect the prices. Do the mean, minimum and maximum make sense?
df['OBS_VALUE'].describe()
Now we will make a new DataFrame called 'ts' (for time series) with just the dates and EURCHF prices.
# Create a new DataFrame called 'ts'
ts = df.filter(['TIME_PERIOD', 'OBS_VALUE'], axis=1)
# 'TIME_PERIOD' was of type 'object' (as seen in df.info). Convert it to datetime first
ts['TIME_PERIOD'] = pd.to_datetime(ts['TIME_PERIOD'])
# Set 'TIME_PERIOD' to be the index
ts = ts.set_index('TIME_PERIOD')
# Print the last 5 rows to screen
ts.tail()
To conclude, we make a chart of the time series:
%matplotlib inline
ts.plot()
Very often, there are already convenient Python packages available on the internet you can use. For instance, if you search the internet for "Python ECB SDMX" you will undoubtly find the 'pandaSDMX' package. It is, as they call it, a "Python client to retrieve and acquire statistical data and metadata disseminated in SDMX 2.1, an ISO-standard widely used by institutions such as statistics offices, central banks, and international organisations."
Interestingly, pandaSDMX ships with built-in support for the following agencies:
More information can be found at: https://pandasdmx.readthedocs.io/en/latest/index.html
In this last section, we are going to retrieve the same data (EURCHF daily exchange rates) with this package. First though, you need to install the package via 'pip' (via the command line):
pip install pandasdmx
Now let's get the data!
from pandasdmx import Request
# Define the source
ecb = Request('ECB')
# Retrieve the data (we start at 2016, because are requesting a larger dataset (including other frequencies))
data_response = ecb.data(resource_id = 'EXR', key={'CURRENCY': ['CHF', 'EUR']}, params = {'startPeriod': '2016'})
data = data_response.data
# The data will be a pandaSDMC 'DataSet'
type(data)
# Show which frequencies are available ('D' is 'daily', you can probably guess the other ones)
set(s.key.FREQ for s in data.series)
# Filter the the daily data and 'write' it to a DataFrame
daily = (s for s in data.series if s.key.FREQ == 'D')
ts2 = data_response.write(daily)
ts2.tail()
%matplotlib inline
ts2.plot()
As you can see, the pandaSDMX pacakage is another easy way to retrieve the same data. Hopefully these examples will help you get started. Thanks for reading!